Similarity Measures for Nominal Variable Clustering
Abstract
The paper deals with selected similarity measures that can be used for hierarchical clustering of nominal variables. Such variables are commonly used in questionnaire surveys. Cluster analysis can be applied when a reduction of dataset size is desired. The paper examines several similarity measures for nominal variable clustering that have been introduced in recent years. In contrast to the simple matching coefficient, which is considered the basic similarity measure, these measures take into account further characteristics of the dataset, such as the frequency distribution of categories. Therefore, they are expected to provide better results than the simple matching coefficient. The performance of clustering with the selected similarity measures is examined on two real datasets. Cluster quality is evaluated with indices based on within-cluster variability. All computations were performed in Matlab, IBM SPSS Statistics and MS Excel.
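The abstract contrasts the simple matching coefficient (SMC) with measures that also use the frequency distribution of categories, but it does not name those measures. The sketch below is a minimal illustration under stated assumptions: objects (rows) are compared across nominal variables, and inverse occurrence frequency (IOF), a frequency-based measure from the categorical-similarity literature, stands in for the frequency-aware family. IOF is not necessarily one of the measures the paper examines, and the helper names `category_frequencies`, `smc`, `iof` and `similarity_matrix` are hypothetical.

```python
import math
from collections import Counter


def category_frequencies(data):
    """For each variable k, count how often each category occurs (f_k)."""
    n_vars = len(data[0])
    return [Counter(row[k] for row in data) for k in range(n_vars)]


def smc(x, y, freqs=None):
    """Simple matching coefficient: share of variables on which x and y agree.

    `freqs` is ignored; it is accepted only so all measures share one signature.
    """
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


def iof(x, y, freqs):
    """Inverse occurrence frequency (IOF) similarity, averaged over variables.

    Per variable: 1 on a match; on a mismatch 1 / (1 + log f(a) * log f(b)),
    so a mismatch between frequent categories counts as less similar than a
    mismatch between rare ones. (Illustrative stand-in, an assumption.)
    """
    total = 0.0
    for k, (a, b) in enumerate(zip(x, y)):
        if a == b:
            total += 1.0
        else:
            total += 1.0 / (1.0 + math.log(freqs[k][a]) * math.log(freqs[k][b]))
    return total / len(x)


def similarity_matrix(data, measure):
    """Symmetric n x n matrix of pairwise similarities under `measure`.

    The diagonal is 1.0 because both measures give 1 for identical objects.
    """
    freqs = category_frequencies(data)
    n = len(data)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = measure(data[i], data[j], freqs)
    return sim


if __name__ == "__main__":
    # Hypothetical toy data: 5 respondents, 2 nominal variables.
    data = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("c", "x")]
    freqs = category_frequencies(data)
    for j in (1, 2):
        print(f"objects 0-{j}: SMC={smc(data[0], data[j]):.3f}, "
              f"IOF={iof(data[0], data[j], freqs):.3f}")
```

On this toy data SMC rates both pairs 0.500, while IOF rates pair (0, 2) higher (0.838 vs 0.784), because its mismatch involves less frequent categories. For hierarchical clustering, one option is to convert the matrix to distances D = 1 − S and pass `scipy.spatial.distance.squareform(D)` to `scipy.cluster.hierarchy.linkage` with, for example, `method="average"`; the linkage method used in the paper is not stated here.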